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Real world problems always have different multiple solutions. For instance, optical en¬ 
gineers need to tune the recording parameters to get as many optimal solutions as possible 
for multiple trials in the varied-line-spacing holographic grating design problem. Unfor¬ 
tunately, most traditional optimization techniques focus on solving for a single optimal 
solution. They need to be applied several times; yet all solutions are not guaranteed to 
be found. Thus the multimodal optimization problem was proposed. In that problem, 
we are interested in not only a single optimal point, but also the others. With strong 
parallel search capability, evolutionary algorithms are shown to be particularly effective 
in solving this type of problem. In particular, the evolutionary algorithms for multimodal 
optimization usually not only locate multiple optima in a single run, but also preserve their 
population diversity throughout a run, resulting in their global optimization ability on mul¬ 
timodal functions. In addition, the techniques for multimodal optimization are borrowed 
as diversity maintenance techniques to other problems. In this chapter, we describe and 
review the state-of-the-arts evolutionary algorithms for multimodal optimization in terms 
of methodology, benchmarking, and application. 

1 Introduction and Background 

Since genetic algorithm was proposed by John H. Holland [9] in the early 1970s, researchers 
have been exploring the power of evolutionary algorithms m- For instance, biological pat¬ 
tern discovery [38] and computer vision m- In particular, its function optimization capa¬ 
bility was highlighted |B| because of its high adaptability to different non-convex function 
landscapes, to which we cannot apply traditional optimization techniques. 

Real world problems always have different multiple solutions [451 I46j . For instance, 
optical engineers need to tune the recording parameters to get as many optimal solutions 
as possible for multiple trials in the varied-line-spacing holographic grating design problem 
because the design constraints are too difficult to be expressed and solved in mathematical 
forms I3I|. Unfortunately, most traditional optimization techniques focus on solving for a 
single optimal solution. They need to be applied several times; yet all solutions are not 
guaranteed to be found. Thus the multimodal optimization problem was proposed. In that 
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problem, we are interested in not only a single optimal point, but also the others. Given 
an objective function, an algorithm is expected to find all optimal points in a single run. 
With strong parallel search capability, evolutionary algorithms are shown to be particularly 
effective in solving this type of problem [6j: Given function / : X — )> M, we would like to 
find all global and local maxima (or minima) of / in a single run. Although the objective 
is clear, it is not easy to be satisfied in practice because some problems may have too 
many optima to be located. Nonetheless, it is still of great interest to researchers how 
these problems are going to be solved because the algorithms for multimodal optimization 
usually not only locate multiple optima in a single run, but also preserve their population 
diversity throughout a run, resulting in their global optimization ability on multimodal 
functions. Moreover, the techniques for multimodal optimization are usually borrowed as 
diversity maintenance techniques to other problems |42l [50] . 

2 Problem Definition 

The multimodal optimization problem definition depends on the type of optimization (min¬ 
imization or maximization). They are similar in principle and defined as follows: 

2.1 Minimization 

In this problem, given /:X —)> M, we would like to find all global and local minimums of / 
in a single run. 

Definition 1 Local Minimum |41j : A (local) minimum x; G X of one (objective) func¬ 
tion /:X —)• M is an input element with f{xi) < f{x) for all x neighboring x;. If X G 
we can write: Vx; 3e > 0 : /(x/) < /(x) Vx G X, |x — x/| < e. 

Definition 2 Global Minimum [4T]: A global minimum x^ G X of one (objective) function 
/:X —)■ M is an input element with f{xg) < /(x) Vx G X. 

2.2 Maximization 

In this problem, given /:X —)• M, we would like to find all global and local maximums of / 
in a single run. 

Definition 3 Local Maximum j41| : A (local) maximum x; G X of one (objective) function 
/:X —)■ M is an input element with /(x;) > /(x) for all x neighboring xi. If X G , we 
can write: Vx; 3e > 0 : /(x;) > /(x) Vx G X, |x — x/| < e. 

Definition 4 Global Maximum m- A global maximum x^ G X of one (objective) function 
/:X —^ M is an input element with f{xg) > f{x) Vx G X. 
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3 Methodology 


In the past literature, there are different evolutionary methods proposed for multimodal 
optimization. In this section, we discuss and categorize them into different methodologies. 

3.1 Preselection 

In 1970, the doctorial thesis by Cavicchio introduced different methods for genetic algo¬ 
rithms [3]. In particular, the preselection scheme was proposed to maintain the population 
diversity. In this scheme, the children compete with their parents for survival. If a child 
has a htness (measured by an objective function) higher than its parent, the parent is 
replaced by the child in the next generation. 

3.2 Crowding 

In 1975, the work by De Jong m introduced the crowding technique to increase the chance 
of locating multiple optima. In the crowding technique, each child is compared to a ran¬ 
dom sub-population of cf members in the existing parent population (c/ means crowding 
factor). The parent member which is most similar to the child itself is selected (measured 
by a distance metric). If the child has a higher fitness than the parent member selected, 
then it replaces the parent member in the population. Besides genetic algorithm, Thomsen 
has also incorporated crowding techniques m into differential evolution (CrowdingDE) for 
multimodal optimization m- In his study, the crowding factor is set to the population size 
and Euclidean distance is adopted as the dissimilarity metric. The smaller the distance, 
the more similar they are and vice versa. Although an intensive computation is accom¬ 
panied, it can effectively transform differential evolution into an algorithm specialized for 
multimodal optimization. In 2012, CrowdingDE has been investigated and extended by 
Wong et al , demonstrating competitive performance even when it is compared to the other 
state-of-the-arts methods |49j . 


3.3 Fitness Sharing 

In 1989, Goldberg and Richardson proposed a fitness-sharing niching technique as a diver¬ 
sity preserving strategy to solve the multimodal optimization problem [7]. They proposed 
a shared htness function, instead of an absolute htness function, to evaluate the htness of 
a individual in order to favor the growth of the individuals which are distinct from the 
others. The shared htness function is dehned as follows: 


f'{xi) = Shared Fitness 


Actual Fitness 
Degree of Sharing 


fjxi) 

Ylf=i sh{d{xi,Xj)) 


where f'{xi) is the shared htness of the Rh individual x*; /(xj) is the actual htness of the 
Rh individual x*; d{xi,Xj) is the distance function between the two individuals x* and xj; 
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sh{d) is the sharing function. With this technique, a population can be prevented from 
the domination of a particular type of individuals. Nonetheless, a careful adjustment to 
the sharing function sh{d) definition is needed because it relates the fitness domain f{xi) 
to the distance domain d{xi,Xj) which are supposed to be independent of each other. 

3.4 Species Conserving 

Species conserving genetic algorithm (SCGA) [T7] is a technique for evolving parallel sub¬ 
populations for multimodal optimization. Before each generation starts, the algorithm 
selects a set of species seeds which can bypass the subsequent procedures and be saved into 
the next generation. The algorithm then divides a population into several species based 
on a dissimilarity measure. The httest individual is selected as the species seed for each 
species. After the identification of species seeds, the population undergoes the usual genetic 
algorithm operations: selection, crossover, and mutation. As the operations may remove 
the survival of less ht species, the saved species seeds are copied back to the population at 
the end of each generation. 

To determine the species seeds in a population, the algorithm hrst sorts the population 
in a decreasing fitness order. Once sorted, it picks up the fittest individual as the first 
species seed and forms a species region around it. The next httest individual is tested 
whether it is located in a species region. If not, it is selected as a species seed and another 
species region is created around it. Otherwise, it is not selected. Similar operations are 
applied to the remaining individuals, which are subsequently checked against all existing 
species seeds. 

To copy the species seeds back to the population after the genetic operations have been 
executed, the algorithms need to scan all the individuals in the current population and 
identify to which species they belong. Once it is identihed, the algorithm replaces the 
worst individual (lowest htness) with the species seed in a species. If no individuals can 
be found in a species for replacement, the algorithm replaces the worst and un-replaced 
individual in the whole population. In short, the main idea is to preserve the population 
diversity by preserving the fittest individual for each species. 

3.5 Covariance Matrix Adaptation 

Evolution strategy is an effective method for numerical optimization. In recent years, its 
variant CMA-ES (Covariance Matrix Adaptation Evolution Strategy) showed a remarkable 
success [H]. To extend its capability, niching techniques have been introduced to cope 
with multimodal functions [35]. For instance, a concept called adaptive individual niche 
radius has been proposed to solve the niche radius problem commonly found in speciation 
algorithms |34j . 
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3.6 Multiobjective Approach 

At this point, we would like to note that the readers should not confuse evolutionary 
multimodal optimization (main theme of this chapter) with evolutionary multiobjec¬ 
tive optimization. The former aims at solving a single function for multiple opima, while 
the latter aims at solving multiple functions for pareto front solutions. Nonetheless, the 
techniques involved are related. In particular. Deb and Saha demonstrated that, by decom¬ 
posing a single multimodal objective function problem into a bi-objective problem, they 
can solve a multimodal function using a evolutionary multiobjective optimization algorithm 
[3j. Briefly, they keep the original multimodal objective function as the first objective. On 
the other hand, they use the gradient information to define peaks in the second objective. 

3.7 Ensemble 

As mentioned in the previous section, different niching algorithms have been proposed 
over the past years. Each algorithm has its own characteristics and design philosophy 
behind. Although it imposes difficult conditions to compare them thoroughly, it is a 
double-edged sword. Such a vast amount of algorithms can provide us a “swiss army 
knife” for optimizations on different problems. In particular, Yu and Suganthan proposed 
an ensemble method to combine those algorithms and form a powerful method called 
Ensemble of Niching Algorithms (ENA) [52j. An extension work can also be found in |32j . 

3.8 Others 

Researchers have been exploring many different ways to deal with the problem. Those 
methods include: clearing j29j, repeated iterations [I], species-specific explosion |l3], traps 
m. stochastic automation [25], honey bee foraging behavior [39], dynamic niching 1211 , 
spatially-structured clearing |5] , cooperative artificial immune network [21] , particle swarm 
optimization [ini im El ng , and island model [2|. In particular, Stoean et al. have pro¬ 
posed a topological species conservation algorithm in which the proper topological separa¬ 
tion into subpopulations has given it an advantage over the existing radius-based algorithms 
[38] . Comparison studies were conducted by Singh et al. [36], Kronfeld et al. m, and 
Yu et al. [53]. Though different methods were proposed in the past, they are all based 
on the same fundamental idea: it is to strike an optimal balance between convergence 
and population diversity in order to locate multiple optima simultaneously in a single run 

[371111]. 
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4 Benchmarking 


4.1 Benchmark Functions 

There are many multimodal functions proposed for benchmarking in the past literature. 
In particular, the following five benchmark functions are widely adopted in literature: 
Deb’s 1st function |43] . Himmelblau function [T], Six-hump Camel Back function |24j . 
Branin function [24] , and Rosenbrock function [33]. In addition, five more benchmark 
functions (PPl to PP5) can be found in |43[I23| . For more viogorous comparisons, the IEEE 
Congress on Evolutionary Computation (CEC) usually releases a test suite for multimodal 
optimization every year. More than 15 test functions can be found there [20] . 

4.2 Performance Metrics 

Several performance metrics have been proposed in the past literature [231 IHl HT] HQ] . 
Among them. Peak Ratio (PR) and Average Minimum Distance to the Real Optima (D) 
|23l (33] are commonly adopted as the performance metrics. 

• A peak is considered found when there exists a individual which is within 0.1 Eu¬ 
clidean distance to the peak in the last population. Thus the Peak Ratio is calculated 
using the equation (3): 

„ , r-, ■ Number of peaks found 

Peak Ratio = (3) 

1 otal number of peaks 

• The average minimum distance to the optima (D) is calculated using the equation 
(4): 

n 

y min d{peaki,indiv) 

^ indivGpop 

D = — - (4) 

n 

where n is the number of peaks, indiv denotes a individual, peaki is the Rh peak, pop 
denotes the last population, and d{peaki, indiv) denotes the distance between peaki 
and indiv. 

As different algorithms perform different operations in one generation, it is unfair to set the 
termination condition as the number of generations. Alternatively, it is also unfair to adopt 
CPU time because it substantially depends on the implementation techniques for different 
algorithms. Eor instance, the sorting techniques to find elitists and the programming lan¬ 
guages used. In contrast, fitness function evaluation is always the performance bottleneck 
[28] Thus the number of fitness function evaluations is suggested to be adopted as the 
running or termination condition for convergence analysis. 

^For instance, over ten hours are needed to evaluate a calculation in computational fluid dynamics m 
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4.3 Statistical Tests 


Since evolutionary multimodal optimization is stochastic in nature, multiple runs are 
needed to evaluate each method on each test function. The means and standard deviations 
of performance metrics are usually reported for fair comparison. To justify the results, 
statistical tests are usually adopted to assess the statistical significances. For instance, 
t-tests, Mann-Whitney U-tests (MWU), and Kohnogorov-Smirnov test (KS). 

5 Application 

Holographic gratings have been widely used in optical instruments for aberration correc¬ 
tions. In particular, Varied-Line-Spacing (VLS) holographic grating is distinguished by the 
high order aberration eliminating capability in diffractive optical systems. It is commonly 
used in high resolution spectrometers and monochromaters. A recording optical system of 
VLS holographic grating is outlined in m- 

5.1 Problem Modelling 

The core component descriptions of the optical systems are listed as follows m- 

Ml, M 2 : Two spherical mirrors 
C, D : Two coherent point sources 
G : A grating blank 


In this system, there are two light point sources C and D. They emit light rays which are 
then reflected by mirrors Mi and M 2 respectively. After the reflection, the light rays are 
projected onto the grating blank G. More details are given in [SIlEe]. The objective for 
the design is to find several sets of design variables (or recording parameters m) to form 
the expected groove shape of G (or the distribution of groove density 130!). The design 
variables are listed as follows: 

7 : The incident angle of the ray OiO 

rjc ■ The incident angle of the ray GOi 

6 : The incident angle of the ray O 2 O 

r]j:) : The incident angle of the ray DO 2 

PC '■ The distance between G and Mi {COi) 
qc ■ The distance between Mi and G (OiO) 

Pd ■ The distance between D and M 2 {DO 2 ) 
qd ■ The distance between M 2 and G (O 2 O) 
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Mathematically, the goal is to minimize the definite integral of the square error between 
the expected groove density and practical groove density [3l]: min J = — ne)‘^dw 

where wq is the half-width of the grating, Up is the practical groove density, and Ug is 
the expected groove density. These two groove densities are complicated functions of the 
design variables. Ling et al. have further derived the above formula into a simpler one m- 

j = r‘^+ wli^rir^ + rl) + 2r2r4) 

^3 5 7 


ri = 


Jio 

Aq 


no 


J20 , 

?’2 = V- nQb2 

^0 


rs 


3^30 

2Ao 


mhz , 



no 64 


where jiO) hoj iso and jAo are the functions of the design variables, which are nio, n 2 o, 
nso and n 4 o respectively in [26]. Theoretically, the above objective is simple and clear. 
Unfortunately, there are many other auxiliary optical components in practice, which con¬ 
straints are too difficult to be expressed and solved in mathematical forms. An optimal 
solution is not necessarily a feasible and favorable solution. Optical engineers often need to 
tune the design variables to find as many optimal solutions as possible for multiple trials. 
Multimodal optimization becomes necessary for this design problem. 


5.2 Performance measurements 

As the objective function is an unknown landscape, the exact optima information is not 
available. Thus the previous performance metrics cannot be adopted. We propose two 
new performance metrics in this section. The hrst one is the best fitness, which is the 
fitness value of the fittest individual in the last population. The second one is the number 
of distinct peaks, where a distinct peak is considered found when there exists a individual 
which fitness value is below a threshold 0.0001 and there isn’t any other individual found as 
a peak before within 0.1 Euclidean distance in the last population. The threshold is chosen 
to 0.0001 because the fitness values of the solutions found in m are around this order of 
magnitude. On the other hand, the distance is chosen to 0.1 Euclidean distance because 
it has already been set for considering peaks found in peak ratio [l3l|23|. Nonetheless, it 
is undeniable that such a threshold may not be suitable for this application because the 
landscape is unknown, although the value of 0.1 is the best choice we can adopt in this 
study. 


5.3 Parameter setting 

CrowdingDE-STL [l9|, CrowdingDE-TL |3^, CrowdingDE-SL ^9], Crowding Genetic Al¬ 
gorithm (CrowdingGA) [13], CrowdingDE [JO], Fitness Sharing Genetic Algorithm (Shar- 
ingGA) [7], SharingDE m, Species Conserving Genetic Algorithm (SCGA) [T7], SDE 






Table 1: Results for all algorithms tested on the VLS holographic grating design problem 
(50 runs) 


Measurement 

CrowdingDE-STL l49l 

CrowdingDE-TL 1491 

CrowdingDE-SL 1491 

GrowdingGA [TsI 

CrowdingDE [401 

Mean of Best Fitness 

8.29E-08 

7.17E-07 

1.18E-07 

9.02E-06 

3.66E-06 

StDev of Best Fitness 

2.91E-07 

5.01E-06 

4.04E-07 

3.40E-05 

2.31E-05 

Mean of Peaks Found 

41.42 

45.54 

43.38 

8.94 

41.98 

StDev of Peaks Found 

13.07 

9.00 

10.69 

5.04 

14.26 

Measurement 

SharingGA [tJ 

SharingDE [40| 

SDE [18J 

SCGA [ITJ 

UN [12J 

Mean of Best Fitness 

1.87E+04 

1.74E+02 

1.13E+00 

1.24E+02 

9.19E-04 

StDev of Best Fitness 

6.82E+04 

2.65E+02 

1.56E+00 

4.59E+02 

3.15E-03 

Mean of Peaks Found 

0.0 

0.0 

0.06 

0.02 

7.22 

StDev of Peaks Found 

0.0 

0.0 

0.31 

0.14 

3.92 


|18j . and UN [12] are selected for illustrative purposes in this application. All the algo¬ 
rithms were run up to a maximum of 10000 fitness function evaluations. The above per¬ 
formance metrics were obtained by taking the average and standard deviation of 50 runs. 
The groove density parameters followed the setting in |31) : no = 1.400 x 10^(/ine/mm), 
62 = 8.2453 X 10"^(l/mm), 63 = 3.0015 x 10“^(l/mm2) and 64 = 0.0000 x 
The half-width wq was 90mm. The radii of spherical mirrors Mi and M 2 were 1000mm. 
The recording wavelength (Aq) was 413.Inm. The population size of all the algorithms 
was set to 50. The previous settings remained the same, except the algorithm-specific 
parameters: The species distance of SDE and SCGA was set to 1000. The scaling factor 
and niche radius of SharingDE and SharingGA were set to 1 and 1000 respectively. The 
discount factor of the temporal locality was set to 0.5. The survival selection method of 
the non-crowding algorithms was set to binary tournament | 12 j . 

5.4 Results 

The result is tabulated in Table It can be observed that GrowdingDE-STL can achieve 
the best fitness whereas GrowdingDE-TL can acheive the best number of peaks found. To 
compare the algorithms rigorously, statistical tests have also been used. The results are 
depicted in Figure One can observe that there are some statistically significant perfor¬ 
mance differences among them. In particular, GrowdingDE-based methods are shown to 
have the results statistically different from GrowdingGA, SharingGA, SharingDE, SDE, 
and SGGA. Some configurations obtained after a run of GrowdingDE-STL on this problem 
are depicted in Figure It can be seen that they are totally different and feasible configu¬ 
rations with which optical engineers can feel free to perform multiple trials after the single 
run. 


6 Discussion 


To conclude, we have briefly reviewed the state-of-the-arts methods of evolutionary mul¬ 
timodal optimization from different perspectives in this chapter. Different evolutionary 
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multimodal optimization methodologies are described. To compare them fairly, we de¬ 
scribed different benchmarking techniques such as performance metrics, test functions, 
and statistical tests. An application to Varied-Line-Spacing (VLS) holographic grating 
is presented to demonstrate the real-wold applicability of evolutionary multimodal opti¬ 
mization. Nonetheless, we would like to note several current limitations of evolutionary 
multimodal optimization as well as the possible solutions at the end of this chapter. 

First, most of the past studies just focus on low dimensional test functions for bench¬ 
marking. More high dimensional test functions should be incorporated in the future. Sec¬ 
ond, we would like to point out that evolutionary multimodal optimization is actually 
far from just finding multiple optima because the algorithms for multimodal optimization 
usually not only locate multiple optima in a single run, but also preserve their population 
diversity throughout a run, resulting in their global optimization ability on multimodal 
functions. Moreover, the techniques for multimodal optimization are usually borrowed as 
diversity maintenance techniques to other problems. Third, the computational complex¬ 
ities of the methods are usually very high comparing with the other methods since they 
involve population diversity maintenance which implies that the related survival operators 
need to take into account the other individuals, resulting in additional time complexity. 
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(c) Peaks Found (d) Peaks Found 

Figure 1: For Table we depict the statistical significance test results for the pairwise 
performance differences between all algorithms tested on the VLS holgraphic grating design 
problem by Mann-Whitney U-test (MWU) and Two-sample Kolmogorov-Smirnov test (KS) 
with p=0.05. Each sub-hgure correspond to the performance comparison using a metric 
by a statistical test. The vertical axis is the same as the horizontal axis. Each algorithm is 
represented by a number on each axis. The numbering of the algorithms follows the order 

in Table For instance, 1 refers to CrowdingDE-STL, 2 refers to CrowdingDE-TL.10 

refers to UN. The color of each block represents whether the algorithm indicated by the 
horizontal axis shows a performance different from the algorithm indicated by the vertical 
axis in a statistically significant way. The black color denotes the p-values higher than 
0.05 whereas the white color denotes the p-values lower than 0.05. The even numbered 
sub-figures are the results obtained by MWU, whereas the odd numbered sub-figures are 
the results obtained by KS. 
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Figure 2: Configurations obtained by a single run of CrowdingDE-STL on the VLS holo¬ 
graphic grating design problem. It can be seen that they are totally different and feasible 
configurations with which optical engineers can feel free to perform multiple trials after 
the single run. 
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